Row1.1

Assembly source

The current dataset includes 454 assemblies for the genus Pectobacterium, comprising assemblies downloaded from the NCBI Assembly database and internal assemblies provided by collaborators. Metadata for all NCBI assemblies was downloaded as XML from the Assembly and BioSample databases using the E-utilities provided by NCBI. The combined Assembly and linked BioSample metadata is summarized in the figures below.
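As a rough illustration of the metadata download step, the sketch below only builds E-utilities request URLs for the Assembly database and sends nothing over the network. The search term `Pectobacterium[Organism]`, the `retmax` value and the helper names are assumptions made for this example; they are not taken from the pipeline behind this dashboard.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(genus, retmax=1000):
    """Build an ESearch URL listing Assembly UIDs for a genus (hypothetical helper)."""
    params = {
        "db": "assembly",
        "term": f"{genus}[Organism]",  # assumed query; the real query may differ
        "retmax": retmax,
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_esummary_url(db, uids):
    """Build an ESummary URL returning document summaries for the given UIDs."""
    params = {"db": db, "id": ",".join(str(uid) for uid in uids)}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


search_url = build_esearch_url("Pectobacterium")
summary_url = build_esummary_url("assembly", [1, 2])  # placeholder UIDs
print(search_url)
print(summary_url)
```

The UIDs returned by the search request would be passed to the summary request; the same summary helper could be pointed at `biosample` for the linked BioSample records.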

Row1.2

Assembly source bar plot

Row2

NCBI internal QC summary

NCBI performs internal QC on the genome assemblies submitted by users. A genome assembly can be excluded from NCBI for several reasons (see here for details). In the current dataset, 0 genome assemblies have known issues. The sunburst chart in the second column of this row shows the number of assemblies flagged by NCBI for each of the QC metrics listed below (in innermost to outermost order):

  • ExclFromRefSeq: Whether the assembly was excluded from RefSeq
  • Anomalous: If an assembly was detected as anomalous
  • Replaced: Whether the assembly was replaced with an updated accession
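The nesting of the three flags can be sketched as a count over flag combinations, where each ring of a sunburst holds the totals for one prefix of the flag order. This is a minimal sketch: the record layout (one dictionary of boolean flags per assembly) and the function name are hypothetical, and the dashboard's own implementation may differ.

```python
from collections import Counter

# Flags in innermost-to-outermost ring order, as listed above.
FLAG_ORDER = ("ExclFromRefSeq", "Anomalous", "Replaced")


def sunburst_counts(assemblies):
    """Count assemblies for every prefix of the flag path (hypothetical helper).

    A key such as (True,) is an inner-ring segment; (True, False) is the
    segment one ring further out, nested under it.
    """
    counts = Counter()
    for record in assemblies:
        # Missing flags are treated as False (an assumption of this sketch).
        path = tuple(bool(record.get(flag, False)) for flag in FLAG_ORDER)
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts


# Made-up example records, not real data.
example = [
    {"ExclFromRefSeq": True, "Anomalous": False, "Replaced": False},
    {"ExclFromRefSeq": True, "Anomalous": True, "Replaced": False},
    {},
]
ring_counts = sunburst_counts(example)
```

Counting every prefix keeps each inner segment equal to the sum of the segments nested under it, which is the property a sunburst layout relies on.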

Additionally, NCBI performs taxonomy validation for prokaryotic genomes. It uses Average Nucleotide Identity (ANI) scores to verify the declared species of each submitted genome. The method is described in Ciufo et al. 2018. The bar chart in the third column of this row shows the number of assemblies for each taxonomy check status.

Sunburst chart

NCBI assembly taxonomy check status

Row2.5

Type strains summary

A species can have multiple type strain genomes available in NCBI. The following table summarizes all type strains for the genus Pectobacterium available in the NCBI Assembly database.

Row2.5

Type strains

Row3

Number of genomes per species

Row4

Geographical location of the sequenced species

Row5

Host

Isolation source: (Top 25)

Environmental medium

Row6.1

The following plots show quantitative data for the genomes, such as N50, L50, contig counts and BUSCO scores.
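For readers unfamiliar with the two contiguity metrics, here is a minimal sketch of how N50 and L50 are commonly computed from a list of contig lengths. N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembly size; L50 is the number of contigs needed to get there. The function name and the example lengths are made up for illustration.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running total to at least half
        # of the assembly size defines N50; its rank is L50.
        if running >= half_total:
            return length, rank


# Toy assembly: total length 290, so the halfway point is 145.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
print(n50, l50)
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is why the two metrics are plotted against contig counts and BUSCO completeness in the rows that follow.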

Row6.2

N-contigs

N-contigs (filtered)

Row7

N50

L50

Row8

#contigs vs N50

#contigs vs L50

Row9

BUSCO genome and protein distribution

BUSCO(protein) vs BUSCO(genome)

Row10

N50 vs complete BUSCO(Genome)

N50 vs complete BUSCO(Protein)

Row11

L50 vs complete BUSCO(Genome)

L50 vs complete BUSCO(Protein)